-
Notifications
You must be signed in to change notification settings - Fork 11
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Closes #71 rounding #72
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great article @kaz462 - I learned a lot about different ways to round 😄 I left some review comments.
repo_spec = "pharmaverse/blog", | ||
name = long_slug | ||
) | ||
``` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a minor thing, and perhaps you don't want to do this, but I thought I would mention anyway - would it be worth adding a section looking a the bigger picture, and whether how far we should be trying to go to match SAS rounding? I'm thinking about the fact that if we are trying to move to R-based submissions, then some of this stuff should in a way be expected and accepted. My idea comes from this comment by @rossfarrugia .
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
when moving from SAS to R submission, what kind of discrepancy are expected/accepted for rounding?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
to add context to my comment before... when using same language for double programming QC often teams would strive for a 100% perfect comparison match, whereas when working in a multi-lingual world maybe it's ok to not perfectly match if differences can be explained in audit-ready QC evidence to explain differences are down to the use of different languages. e.g. a % in first line output with R is 1% and in QC with SAS it is 2%, and upon inspection the QC'er sees that the actual value is 1.5% rounded, hence just add a comment in QC evidence to explain such differences are considered acceptable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
e.g. a % in first line output with R is 1% and in QC with SAS it is 2%, and upon inspection the QC'er sees that the actual value is 1.5% rounded, hence just add a comment in QC evidence to explain such differences are considered acceptable.
This discrepancy is not caused by SAS/R, but by different rounding approaches, which I think should be based on the study SAP instead of the software. Should the rounding method be clarified first, then choose the corresponding functions in SAS/R?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd doubt most SAPs would go into such detail as nobody really ever questioned rounding approaches when we only used single language. It's a fair point though. I feel like a closing note such as "Note: with the differences in default behaviour across languages, you could consider your QC strategy and whether an acceptable level of fuzz in the electronic comparisons could be allowed for cases such as rounding when making comparisons between 2 codes written in different languages as long as this is documented. Alternatively you could document the exact rounding approach to be used in the SAP and then match this regardless of programming language used."
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good summary, I will add it next week after the holiday.
The reason I prefer to align with the rounding method during QC is that when we accept the difference, it's easy to overlook the differences caused by other reasons., e.g., RConsortium/submissions-pilot3-adam#99
Thanks @rossfarrugia @clarkliming for your comments and discussion!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
omg this rounding numbers graphic is amazing!!
I think we might want to mention SAS's |
The post just sort of ends abruptly - anyway we could tie it up with your current experiences/summary of the post? This one is going to be awesome!!! |
what do you think of changing "Bug Fix" to "Numerical precision issue"? |
@manciniedoardo @bms63 Thanks so much for your review and comments! I have limited knowledge on computational precision and floating-point, so I didn't go too deep on this topic, let me know if everything makes sense by referencing multiple resources. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think option 2 for the table would be helpful, but I'm okay if it doesn't happen.
The discussion on a clean compares between two languages is interesting. Maybe a follow up post?
Perhaps leave them with a question at the end of the post. How concerned should we be if a programs in R and and SAS are off ...
Good idea on the open-ended question. The following example illustrate the "safe" options I mentioned in the last section doesn't work as expected, but I'm not sure if this will ever happen in our daily work. In this example, > a <- 1.5 - 0.5*sqrt(.Machine$double.eps)
> janitor::round_half_up(a, digits = 0)
[1] 2 If it happens, maybe we could argue > all.equal(a, 1.5)
[1] TRUE @manciniedoardo @rossfarrugia @bms63 depends on the situation, maybe discrepancies like this example can be accepted? |
After the fix: | ||
|
||
```{r, message = FALSE} | ||
# revised rounds half up |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
personally, I don't think we should do this. this will change a lot of behaviors, like ut_round1(2436.84499999999999, 2)
gives 2436.85 while round(2436.84499999999999, 2)
gives 2436.84 (which is correct)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@clarkliming Thanks for the review! your example is a similar case to the one mentioned above: a <- 1.5 - 0.5*sqrt(.Machine$double.eps)
The reason for this behavior is that 2436.845 - 2436.84499999999999
is less than sqrt(.Machine$double.eps)
, the adjustment in the function is not big enough to round it up. When we encounter this situation, maybe we could handle it by the following options -
-
accept this value and explain:
2436.845
and2436.84499999999999
are nearly equal by usingall.equal()
function from base R with defaulttolerance
value (defaulttolerance = sqrt(.Machine$double.eps)
- the output from SAS
round(2436.84499999999999, 0.01)
is also2436.85
- the output from SAS
-
choose a smaller value than
sqrt(.Machine$double.eps)
or remove it to adjust this function
How would you do "round half up" in R if we don't choose those functions (ut_round1
, janitor::round_half_up
, ...)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I assume this function will be used deeply inside other functions and large-scale used, so controlling the tolerance manually is not likely(which means that users can only apply one tolerance to all the numeric results obtained).
I will split your question into two: one is how do we deal with round half up, the other is how do we deal with rounding error.
My response to these two questions is the same: no action is needed.
For question 1, R is following IEEE 754, with default bankers rounding. What SAS provided seems reasonable, however, R is not wrong. This is just a different method. R does not need to copy SAS.
For question 2, about the accuracy loss in computation, I will also do nothing. This is an issue by design, and not possible to perfectly solve at the moment.
Both issues won't affect the interpretation of the result
So for me, introducing the fact that R and SAS are different in rounding should be sufficient.
Come back to the issue. The function ut_round1
tries to solve the rounding error arising from the design, but it did not. So it is important to also mention this point. (also, this issue is not possible to solve in the current design, this fix only fixes one part of the issue)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not trying to claim that base R's round function is wrong or we should strictly mimic SAS's behavior, but to provide round half up options in R. Similar as the intention of providing rounding option in Tplyr, offering users more flexibility - there may be contexts where 'round half up' is required.
What do you think of using the following examples to highlight those round_half_up functions do not offer the same level of precision and accuracy as the base R round function?
Please feel free to update this post directly, or let me know how to clarify this point :)
> a <- 1.5 - 0.5*sqrt(.Machine$double.eps)
> b <- 1.5 - 0.5*.Machine$double.eps
> janitor::round_half_up(a)
[1] 2
> janitor::round_half_up(b)
[1] 2
> round(a)
[1] 1
# base round reaches the precision limit:
> round(b)
[1] 2
# Conflicts: # inst/WORDLIST.txt
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is all good for me! @rossfarrugia @StefanThoma any last concerns or can we merge in and publish?
Fine from my side 👍 |
No description provided.